Abstract. In some estimation problems, especially in applications dealing
with information theory, signal processing and biology, theory provides
us with additional information allowing us to restrict the parameter
space to a finite number of points. In this case, we speak of discrete
parameter models. Even though the problem is quite old and has
interesting connections with testing and model selection, asymptotic
theory for these models has hardly ever been studied. Therefore, we discuss
consistency, asymptotic distribution theory, information inequalities
and their relations with efficiency and superefficiency for a general
class of m-estimators.
1. INTRODUCTION

Sometimes, especially in applications dealing with 
signal processing and biology, theory provides us 
with some additional information allowing us to re- 
strict the parameter space to a finite number of 
points; in these cases, we speak of discrete parame- 
ter models. Statistical inference when the parameter 
space is reduced to a lattice was first considered by 
Hammersley [33] in a seminal paper. However, since 
the author was motivated by the measurement of 
the mean weight of insulin, he focused mainly on 
the case of a Gaussian distribution with known vari- 
ance and unknown integer mean (see [33], page 192); 
this case was further developed by Khan [46-49]. 
The Poisson case also met some attention in the lit- 
erature and was dealt with by Hammersley ([33], 
page 199) and others [61, 75]. 
Previous works have shown that the rate of convergence
of m-estimators is often exponential [33, 80, 82, 83].
General treatments of admissibility
and related topics are in [28, 38, 62, 73] (see also the 
book [9]); special cases have been dealt with in [44] 
(page 424, for the case of a translation integral pa- 
rameter and of integral data under the quadratic 
loss), [29, 33, 46-49] (for the case of the Gaussian 
distribution) and [11] (for the case of the discrete 
uniform distribution). Other papers dealing with optimality
in discrete parameter spaces are [27, 78,
79, 81, 84]. Optimality of estimation under a discrete
parameter space was also considered by Vajda
[80, 82, 83] in a nonorthodox setting inspired by
Rényi's theory of random search. Other aspects that
have been studied are Bayesian encompassing [24], 
construction of confidence intervals ([19], pages 224- 
225), comparison of statistical experiments ([77], [56], 
Section 2.2), sufficiency and minimal sufficiency [54] 
and best prediction [76]. Moreover, in the estima- 
tion of complex statistical models (see [31], [18], 
Chapter 4) and in the calculation of efficiency rates 
(see [1, 15, 56]), approximating a general parameter 
space by a sequence of finite sets has proved to be 
a valuable tool. A few papers showed the practical 
importance of discrete parameter models in signal 
processing, automatic control and information the- 
ory and derived some bounds on the performance of 
the estimators (see [3-6, 34-36, 52, 53, 58]). More 
recently, the topic has received new interest in the 
information theory literature (see [43, 69], and the 
review paper [37]), in stochastic integer programming
(see [25, 50, 86]), and in geodesy (see, e.g., [76],
Section 5). 

However, no general formula for the convergence 
rate has ever been obtained, no optimality proof 
under generic conditions has been provided and no 
general discussion of efficiency and superefficiency in 
discrete parameter models has appeared in the liter- 
ature. In the present paper, we provide a full answer 
to these problems in the case of discrete parameter 
models for samples of i.i.d. (independent and iden- 
tically distributed) random variables. Therefore, af- 
ter introducing some examples of discrete parame- 
ter models in Section 2, in Section 3 we investigate 
the properties of a class of m-estimators. In partic- 
ular, in Section 3.1, we derive some conditions for 
strong consistency; then, in Section 3.2, we calculate 
an asymptotic approximation of the distribution of 
the estimator and we establish its convergence rate. 
These results are specialized to the case of the max- 
imum likelihood estimator (MLE) and extended to 
Bayes estimators in Section 3.3. In Section 4, we de- 
rive upper bounds for the convergence rate in the 
standard and in the minimax contexts, and we dis- 
cuss the relations between information inequalities, 
efficiency and superefficiency. In particular, we prove 
that estimators of discrete parameters have uncom- 
mon efficiency properties. Indeed, under the zero- 
one loss function, no estimator is efficient in the 
class of consistent estimators for any value of $\theta_0 \in \Theta$
($\theta_0$ being here the true value of the parameter) and
no estimator attains the information inequality we 
derive. But the MLE still has some appealing prop- 
erties since it is minimax efficient and attains the 
minimax information inequality bound. 

2. EXAMPLES OF DISCRETE PARAMETER 
MODELS 

The following examples are intended to show the 
relevance of discrete parameter spaces in applied 
and theoretical statistics. In particular, they show 
that the results in the following sections solve some 
long-standing problems in statistics, optimization, 
information theory and signal processing. 

We recall that a statistical model is a collection
of probability measures $\mathcal{P} = \{P_\theta, \theta \in \Theta\}$ where $\Theta$ is
the parameter space. $\Theta$ is a subset of a Euclidean or
of a more abstract space.

Example 1 (Tumor transplantability). We consider
tumor transplantability in mice. For a certain
type of mating, the probability of a tumor "taking"
when transplanted from the grandparents to the offspring
is equal to $(\frac{3}{4})^{\theta}$ where $\theta$ is an integer equal to
the number of genes determining transplantability.
For another type of mating, the probability is $(\frac{1}{2})^{\theta}$.
We aim at estimating $\theta$ knowing that $n_0$ transplants
take out of $n$. The likelihood is given by

$$\ell_n(\theta) = \binom{n}{n_0} k^{\theta n_0} (1 - k^{\theta})^{n - n_0}, \qquad k \in \{\tfrac{3}{4}, \tfrac{1}{2}\}.$$

In this case the parameter space is discrete and the
maximum likelihood estimator can be shown to be
$\hat{\theta}^n = \mathrm{ni}[\ln(n_0/n) / \ln k]$ where $\mathrm{ni}[x]$ is the integer nearest
to $x$ (see [33], page 236).

Example 2 (Exponential family restricted to a lat- 
tice). Consider a random variable X distributed 
according to an exponential family where the natural
parameter $\theta$ is restricted to a lattice $\{\theta_0 + \varepsilon \cdot N, N \in \mathbb{N}^k\}$,
for fixed $\theta_0$ and $\varepsilon$ (see [57], page 759).
The case of a Gaussian distribution has been con- 
sidered in [33] (page 192) and [46, 48], the Poisson 
case in [33] (page 199), [61, 75]. In particular, [33] 
uses the Gaussian model to estimate the molecular 
weight of insulin, assumed to be an integer (how- 
ever, see the remarks of Tweedie in the discussion 
of the same paper). 

Example 3 (Stochastic discrete optimization). 
We consider the optimization problem of the form
$\min_{x \in S} g(x)$, where $g(x) = \mathbb{E} G(x, W)$ is an integral
functional, $\mathbb{E}$ is the mean under probability $P$,
$G(x, w)$ is a real-valued function of two variables $x$
and $w$, $W$ is a random variable having probability
distribution $P$ and $S$ is a finite set. We approximate
this problem through the sample average function
$\hat{g}_n(x) = \frac{1}{n} \sum_{i=1}^{n} G(x, W_i)$ and the associated problem
$\min_{x \in S} \hat{g}_n(x)$. See [50] for some theoretical results
and a discussion of the stochastic knapsack
problem and [86] for an up-to-date bibliography. 
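
The sample average approximation of Example 3 can be sketched in a few lines of Python. The cost function $G(x, w) = (x - w)^2$, the distribution $W \sim N(2.3, 1)$, the feasible set $S = \{0, \dots, 5\}$ and the function name `sample_average_minimizer` are assumptions made for this illustration only; they are not taken from [50] or [86].

```python
import random


def sample_average_minimizer(candidates, cost, draws):
    """Minimize the sample average of cost(x, w) over a finite candidate set."""
    n = len(draws)
    # One sample average per feasible point x in S.
    averages = {x: sum(cost(x, w) for w in draws) / n for x in candidates}
    best = min(averages, key=averages.get)
    return best, averages


# Hypothetical instance: G(x, w) = (x - w)^2, W ~ N(2.3, 1), S = {0, ..., 5}.
rng = random.Random(0)
draws = [rng.gauss(2.3, 1.0) for _ in range(2000)]
best, averages = sample_average_minimizer(
    range(6), lambda x, w: (x - w) ** 2, draws
)
print(best, round(averages[best], 3))
```

In this instance $g(x) = (x - 2.3)^2 + 1$, so the true minimizer over $S$ is $x = 2$, and the sample problem selects the point of $S$ nearest to the sample mean of the draws.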

Example 4 (Approximate inference). In many 
applied cases, the requirement that the true model 
generating the data corresponds to a point belong- 
ing to the parameter space appears to be too strong 
and unlikely. Moreover, the objective is often to re- 
cover a model reproducing some stylized facts from 
the original data. In these cases, approximation of 
a continuous parameter space with a finite number 
of points allows for obtaining such a model under 
weaker assumptions. This situation arises, for ex- 
ample, in signal processing and automatic control 
applications [4-6, 34-36] and is reminiscent of some 
related statistical techniques, such as the discretization
device of Le Cam ([56], Section 6.3), or the
sieve estimation of Grenander ([31]; see also [26], 
Remark 5). 

Example 5 (M-ary hypotheses testing and re- 
lated fields). In information theory, discrete pa- 
rameter models are quite common, and their estima- 
tion is a generalization of binary hypothesis testing 
that goes under the names of M-ary hypotheses (or 
multihypothesis) testing, classification or detection 
(see the examples in [63]). Consider a received waveform
$r(t)$ described by the equation $r(t) = m(t) + \sigma n(t)$
for $t > 0$, where $m(t)$ is a deterministic signal,
$n(t)$ is an additive Gaussian white noise and $\sigma$
is the noise intensity. The set of possible signals
is restricted to a finite number of alternatives, say
$\{m_0(t), \dots, m_J(t)\}$: the chosen signal is usually the
one that maximizes the log-likelihood of the sample,
or an alternative criterion function. For example, if
the log-likelihood of the process based on the observation
window $[0, T]$ is used, we have

$$\hat{m}(\cdot) = \arg\max_{j = 0, \dots, J} \frac{1}{\sigma^2} \left[ \int_0^T m_j(t) r(t) \, dt - \frac{1}{2} \int_0^T m_j^2(t) \, dt \right].$$


Much more complex cases can be dealt with; see [37] 
for an introduction. 
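
A discretized version of the decision rule of Example 5 can be sketched as follows. The sinusoidal signals, the grid size, the noise intensity and the name `detect_signal` are hypothetical choices for this illustration; the integral of $m_j(t) r(t)$ is approximated by a sum of $m_j(t_k)$ times the increments of the observed process, which is an assumption of the sketch and not a statement about the methods reviewed in [37].

```python
import math
import random


def detect_signal(signals, increments, dt, sigma):
    """Index j maximizing (1/sigma^2) * [sum m_j dR - 0.5 * sum m_j^2 dt]."""
    scores = []
    for m in signals:
        correlation = sum(m_k * dr for m_k, dr in zip(m, increments))
        energy = sum(m_k * m_k for m_k in m) * dt
        scores.append((correlation - 0.5 * energy) / sigma ** 2)
    chosen = max(range(len(signals)), key=lambda j: scores[j])
    return chosen, scores


# Hypothetical setup: three sinusoids on [0, T], the second one is transmitted.
T, steps, sigma, true_index = 1.0, 1000, 0.05, 1
dt = T / steps
grid = [k * dt for k in range(steps)]
signals = [[math.sin(2 * math.pi * (j + 1) * t) for t in grid] for j in range(3)]

rng = random.Random(1)
# Increments of the observed process: signal part plus scaled Gaussian noise.
increments = [
    signals[true_index][k] * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    for k in range(steps)
]
chosen, scores = detect_signal(signals, increments, dt, sigma)
print(chosen)
```

With these values the transmitted signal is separated from the alternatives by a wide margin, so the rule recovers `true_index`; with a larger `sigma` errors become possible, which is the situation studied in the following sections.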

3. M-ESTIMATORS IN DISCRETE 
PARAMETER MODELS 

In this section, we consider an estimator obtained 
by maximizing an objective function of the form

$$Q_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ln q(y_i; \theta);$$

in what follows, we allow for misspecification. Note
that the expression m-estimator stands for maxi- 
mum likelihood type estimator, in the spirit of Hu- 
ber [39], and not for maximum (or extremum) esti- 
mator (see, e.g., [64], page 2114). 

3.1 Consistency of m-Estimators 

In the case of a discrete parameter space, uni- 
form convergence reduces to pointwise convergence. 
Therefore, m-estimators are strongly consistent un- 
der less stringent conditions than in the standard 
case; in particular, no condition is needed on the 
continuity or differentiability of the objective function.
The following assumption is used in order to
prove consistency in the case of i.i.d. replications: 

A1. The data $(Y_i)_{i=1}^{n}$ are realizations of i.i.d. $(\mathbb{Y}, \mathcal{Y})$-valued
random variables having probability measure $P_0$.

The estimator $\hat{\theta}^n$ is obtained by maximizing
over the set $\Theta = \{\theta_0, \theta_1, \dots, \theta_J\}$, of finite cardinality,
the objective function

$$Q_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ln q(y_i; \theta).$$

The function $q$ is $\mathcal{Y}$-measurable for each $\theta \in \Theta$
and satisfies the $L^1$-domination condition
$E_0 |\ln q(Y; \theta)| < +\infty$ for every $\theta \in \Theta$, where $E_0$
denotes the expectation taken under the true
probability measure $P_0$.

Moreover, $\theta_0$ is the point of $\Theta$ maximizing
$E_0 \ln q(Y; \theta)$ and $\theta_0$ is globally identified (see [64],
Section 2.2).

Remark 1. (i) The assumption of a finite parameter
space seems restrictive with respect to the
more general assumption of $\Theta$ being countable (see,
e.g., [33]). However, A1 is compatible with the convex
hull of $\Theta$ being compact, as in standard asymptotic
theory. Indeed, the cases analyzed in [33] have
convex likelihood functions and this is a well-known
substitute for compactness of $\Theta$ (see [64], page 2133;
see [17], for consistency with neither convexity nor
compactness). Moreover, the restriction to finite parameter
spaces seems to be necessary to derive the
asymptotic approximation to the distribution of m-estimators.

(ii) The relative position of the points of $\Theta$ is
unimportant and the choice of $\theta_0$ as the maximizer
is arbitrary and is made only for practical purposes.
Note that $\theta_0$ has no link with $P_0$ apart from being
the pseudo-true value of $\ln q$ with respect to $P_0$ on
the parameter space $\Theta$ (see, e.g., [30], Volume 1,
page 14).

Proposition 1. Under Assumption A1, the m-estimator
$\hat{\theta}^n$ is a $P_0$-strongly consistent estimator
of $\theta_0$ and is $\mathcal{Y}^{\otimes n}$-measurable.
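
To fix ideas, the m-estimator over a finite parameter set can be sketched in Python as a plain maximization of the sample objective. The example below is deliberately misspecified: the data are uniform while $\ln q$ is a Gaussian log-density up to constants. The parameter set, the data-generating distribution and the names `m_estimator` and `gaussian_log_q` are assumptions made for this illustration.

```python
import random


def m_estimator(data, thetas, log_q):
    """Point of the finite set `thetas` maximizing Q_n(theta) = mean of ln q."""
    n = len(data)

    def objective(theta):
        return sum(log_q(y, theta) for y in data) / n

    return max(thetas, key=objective)


def gaussian_log_q(y, theta):
    # Gaussian log-density with unit variance, up to an additive constant.
    return -0.5 * (y - theta) ** 2


# Hypothetical misspecified setup: Y uniform on [0.2, 2.2], Theta = {0, 1, 2, 3}.
rng = random.Random(42)
thetas = [0, 1, 2, 3]
data = [1.2 + rng.uniform(-1.0, 1.0) for _ in range(500)]
estimate = m_estimator(data, thetas, gaussian_log_q)
print(estimate)
```

Here $E_0 \ln q(Y; \theta)$ is maximized over the four points at $\theta = 1$, the point nearest to the mean $1.2$, so $1$ plays the role of the pseudo-true value $\theta_0$ and the estimate settles on it for moderately large samples.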

Remark 2. A similar result of consistency for 
discrete parameter spaces has been provided by [74] 
(page 446), by [13, 14] (pages 325-333), by [8] 
(pages 1293-1294) as an application of the Shannon- 
McMillan-Breiman Theorem of information theory, 
by [87] (Section 2.1) as a preliminary result of his 
work on partial likelihood, and by [60] (page 96, Sec- 
tion 7.1.6). 
3.2 Distribution of the m-Estimator

For a discrete parameter space, the finite sample
distribution of the m-estimator $\hat{\theta}^n$ is a discrete distribution
converging to a Dirac mass concentrated
at $\theta_0$. Since the determination of an asymptotic approximation
to this distribution is an interesting and
open problem, we derive in this section upper and
lower bounds and asymptotic estimates for probabilities
of the form $P_0(\hat{\theta}^n = \theta_i)$.

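To simplify the following discussion, we introduce
the processes:

$$\begin{aligned}
Q_n(\theta_j) &= \frac{1}{n} \sum_{i=1}^{n} \ln q(y_i; \theta_j), \qquad j = 0, \dots, J, \\
X_k^{(i)} &= [\ln q(Y_k; \theta_i) - \ln q(Y_k; \theta_j)]_{j = 0, \dots, J, \, j \neq i}, \qquad i = 1, \dots, J, \\
X_k = X_k^{(0)} &= [\ln q(Y_k; \theta_0) - \ln q(Y_k; \theta_j)]_{j = 1, \dots, J}.
\end{aligned} \tag{1}$$
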
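The probability of the estimator $\hat{\theta}^n$ taking on the
value $\theta_i$ can be written as

$$P_0(\hat{\theta}^n = \theta_i) = P_0(Q_n(\theta_i) > Q_n(\theta_j), \forall j \neq i) = P_0\left( \sum_{k=1}^{n} X_k^{(i)} \in \operatorname{int} \mathbb{R}_+^J \right). \tag{2}$$
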

The only approaches that have been successful in our 
experience are large deviations (in logarithmic and 
exact form) and saddlepoint approximations. Note 
that we could have defined the probability in (2) as
$P_0(Q_n(\theta_i) \geq Q_n(\theta_j), \forall j \neq i)$ or through any other
combination of equality and inequality signs; this introduces
some arbitrariness in the distribution of $\hat{\theta}^n$.
However, we will give some conditions (see Proposition 2)
under which this difference is asymptotically
irrelevant. 

Section 3.2.1 introduces definitions and assumptions
and discusses a preliminary result. In Section 3.2.2
we derive some results on the asymptotic
behavior of $P_0(\hat{\theta}^n = \theta_i)$ using large deviations principles
(LDP). Then, we provide some refinements of
the previous expressions using the theory of exact
asymptotics for large deviations, with special reference
to the case $J = 1$. Finally, Section 3.2.3 derives
saddlepoint approximations for probabilities of the
form (2).

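3.2.1 Definitions, assumptions and preliminary results.
As concerns the distribution of the m-estimator $\hat{\theta}^n$,
we shall need some concepts and functions
derived from large deviations theory (see [21]); we
recall that the processes $Q_n(\theta_j)$ and $X_k^{(i)}$ have
been introduced in (1). Then, for $i = 0, \dots, J$, we
define the moment generating functions

$$M^{(i)}(\lambda) = E_0\left[ e^{\sum_{j = 0, \dots, J, \, j \neq i} \lambda_j \cdot [\ln q(Y; \theta_i) - \ln q(Y; \theta_j)]} \right] = E_0\left[ e^{\lambda^{\mathsf{T}} X^{(i)}} \right],$$

the logarithmic moment generating functions

$$\Lambda^{(i)}(\lambda) = \ln M^{(i)}(\lambda) = \ln E_0\left[ e^{\sum_{j = 0, \dots, J, \, j \neq i} \lambda_j \cdot [\ln q(Y; \theta_i) - \ln q(Y; \theta_j)]} \right] = \ln E_0\left[ e^{\lambda^{\mathsf{T}} X^{(i)}} \right]$$

and the Cramér transforms

$$\Lambda^{(i),*}(y) = \sup_{\lambda \in \mathbb{R}^J} [\langle y, \lambda \rangle - \Lambda^{(i)}(\lambda)],$$

where $\langle \cdot, \cdot \rangle$ is the scalar product. Note that, in what
follows, $M(\lambda)$, $\Lambda(\lambda)$ and $\Lambda^*(y)$ are respectively shortcuts
for $M^{(0)}(\lambda)$, $\Lambda^{(0)}(\lambda)$ and $\Lambda^{(0),*}(y)$. Moreover,
for a function $f : E \to \mathbb{R}$, we will need the definition
of the effective domain of $f$, $\mathcal{D}_f = \{x \in E : f(x) < \infty\}$.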
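
As an illustration of these definitions, the following Python sketch evaluates a Cramér transform numerically in the simplest case $J = 1$. It assumes a correctly specified Gaussian model with known standard deviation `sigma` and two candidate means at distance `delta`, so that $X^{(1)} = \ln q(Y; \theta_1) - \ln q(Y; \theta_0)$ is Gaussian with mean $-d$ and variance $2d$, where $d = \delta^2 / (2\sigma^2)$, and $\Lambda^{(1)}(\lambda) = -d\lambda + d\lambda^2$. The function name `cramer_transform`, the search interval and the use of a ternary search are choices made for this sketch only.

```python
def cramer_transform(log_mgf, y, lower=-50.0, upper=50.0, iterations=200):
    """Evaluate sup over lambda in [lower, upper] of y*lambda - log_mgf(lambda).

    The objective is concave in lambda, so a ternary search is sufficient.
    Returns the value of the supremum and the maximizing lambda.
    """

    def objective(lam):
        return y * lam - log_mgf(lam)

    for _ in range(iterations):
        left = lower + (upper - lower) / 3.0
        right = upper - (upper - lower) / 3.0
        if objective(left) < objective(right):
            lower = left
        else:
            upper = right
    lam = 0.5 * (lower + upper)
    return objective(lam), lam


# Hypothetical two-point Gaussian model: |theta_1 - theta_0| = delta.
delta, sigma = 1.0, 1.0
d = delta ** 2 / (2.0 * sigma ** 2)


def log_mgf_1(lam):
    # Logarithmic moment generating function of X^{(1)} ~ N(-d, 2d).
    return -d * lam + d * lam ** 2


rate, maximizer = cramer_transform(log_mgf_1, 0.0)
print(rate, maximizer)
```

In this example the closed form is $\Lambda^{(1),*}(y) = (y + d)^2 / (4d)$, so that $\Lambda^{(1),*}(0) = d/4 = \delta^2 / (8\sigma^2)$, attained at $\lambda = 1/2$; the numerical values agree with it up to rounding.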

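The following assumptions will be used to approximate
the distribution of $\hat{\theta}^n$.

A2. There exists a $\delta > 0$ such that, for any $\eta \in (-\delta, \delta)$,
we have

$$E_0\left[ \left( \frac{q(Y; \theta_j)}{q(Y; \theta_k)} \right)^{\eta} \right] < +\infty \qquad \forall j, k = 0, \dots, J.$$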

Remark 3. In what follows, this assumption could
be replaced by a condition as in [68] (Assumptions H1
and H2).



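A3. $\Lambda^{(i)}(\lambda)$ is steep, that is,
$\lim_{n \to \infty} \left\| \frac{\partial \Lambda^{(i)}(x_n)}{\partial x} \right\| = \infty$
whenever $\{x_n\}_n$ is a sequence in $\operatorname{int}(\mathcal{D}_{\Lambda^{(i)}})$ converging
to a boundary point of $\operatorname{int}(\mathcal{D}_{\Lambda^{(i)}})$.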

Remark 4. Under Assumptions A1, A2 and A3,
$\Lambda^{(i)}(\cdot)$ is essentially smooth (see, e.g., [21], page 44).
A sufficient condition for A3 and essential smoothness
is openness of $\mathcal{D}_{\Lambda^{(i)}}$ (see […], page 905, and [40],
pages 505-506).

A4. $\operatorname{int}(\mathbb{R}_+^J \cap S^{(i)}) \neq \emptyset$, where $S^{(i)}$ is the closure of
the convex hull of the support of the law of $X^{(i)}$.

We will also need the following lemma showing the
equivalence between Assumption A2 and the so-called
Cramér condition $0 \in \operatorname{int}(\mathcal{D}_{\Lambda^{(i)}})$, for any $i = 0, \dots, J$.

Lemma 1. Under Assumption A1, the following
conditions are equivalent:

(i) Assumption A2 holds;

(ii) $0 \in \operatorname{int}(\mathcal{D}_{\Lambda^{(i)}})$, for any $i = 0, \dots, J$.

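As concerns the saddlepoint approximation of Section 3.2.3,
we need the following assumption:

A5. The inequality

$$\left| E_0\left[ \prod_{j = 0, \dots, J, \, j \neq i} \left( \frac{q(Y; \theta_i)}{q(Y; \theta_j)} \right)^{u_j + \mathrm{i} t_j} \right] \right| < (1 - \delta) \cdot E_0\left[ \prod_{j = 0, \dots, J, \, j \neq i} \left( \frac{q(Y; \theta_i)}{q(Y; \theta_j)} \right)^{u_j} \right] < \infty$$

holds for $u \in \operatorname{int}(\mathcal{D}_{\Lambda^{(i)}})$, $\delta > 0$ and $c < |t| < C \cdot n^{(s-3)/2}$
($\mathrm{i}$ denotes the imaginary unit).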

3.2.2 Large deviations asymptotics. In this section
we consider large deviations asymptotics. We
note that, in what follows, $\operatorname{int}(\mathbb{R}_+^J)^c$ stands for
$\operatorname{int}\{[(\mathbb{R}_+)^J]^c\}$.

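Proposition 2. (i) For $i = 1, \dots, J$, under Assumption A1,
the following result holds:

$$P_0(\hat{\theta}^n = \theta_i) \geq \exp\left\{ -n \cdot \inf_{y \in \operatorname{int} \mathbb{R}_+^J} \Lambda^{(i),*}(y) + o_{\inf}(n) \right\},$$

where $o_{\inf}(n)$ is a function such that
$\liminf_{n \to \infty} \frac{o_{\inf}(n)}{n} = 0$.

(ii) Under Assumptions A1 and A2:

$$P_0(\hat{\theta}^n = \theta_i) \leq \exp\left\{ -n \cdot \inf_{y \in \mathbb{R}_+^J} \Lambda^{(i),*}(y) - o_{\sup}(n) \right\},$$

where $o_{\sup}(n)$ is a function such that
$\limsup_{n \to \infty} \frac{o_{\sup}(n)}{n} = 0$.

(iii) Under Assumptions A1, A2, A3 and A4:

$$P_0(\hat{\theta}^n = \theta_i) = \exp\left\{ -(n + o(n)) \cdot \inf_{y \in \operatorname{int}(\mathbb{R}_+^J)} \Lambda^{(i),*}(y) \right\} = \exp\left\{ -(n + o(n)) \cdot \inf_{y \in \mathbb{R}_+^J} \Lambda^{(i),*}(y) \right\}.$$
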

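Proposition 3. (i) Under Assumption A1, the following
inequality holds:

$$P_0(\hat{\theta}^n \neq \theta_0) \geq H \cdot \exp\left\{ -n \cdot \inf_{y \in \operatorname{int}(\mathbb{R}_+^J)^c} \Lambda^*(y) + o_{\inf}(n) \right\},$$

where $H$ is the finite cardinality of the set
$\arg\inf_{y \in \operatorname{int}(\mathbb{R}_+^J)^c} \Lambda^*(y)$ and $o_{\inf}(n)$ is a function such
that $\liminf_{n \to \infty} \frac{o_{\inf}(n)}{n} = 0$.

(ii) Under Assumptions A1 and A2:

$$P_0(\hat{\theta}^n \neq \theta_0) \leq H \cdot \exp\left\{ -n \cdot \inf_{y \in [\operatorname{int}(\mathbb{R}_+^J)]^c} \Lambda^*(y) - o_{\sup}(n) \right\},$$

where $o_{\sup}(n)$ is a function such that
$\limsup_{n \to \infty} \frac{o_{\sup}(n)}{n} = 0$.
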

Remark 5. The proposition allows us to obtain
an upper bound on the bias of the m-estimator,
$\operatorname{Bias}(\hat{\theta}^n) \leq \sup_{j \neq 0} |\theta_j - \theta_0| \cdot P_0(\hat{\theta}^n \neq \theta_0)$.

A better description of the asymptotic behavior of
the probability $P_0(\hat{\theta}^n = \theta_i)$ could be obtained, under
some additional conditions, from the study of
the neighborhood of the contact point between the
set $(\mathbb{R}_+)^J$ and the level sets of the Cramér transform
$\Lambda^{(i),*}(\cdot)$. We leave the topic for future work.
Here we just note the following bounds on the
convergence rate.

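Proposition 4. Under Assumptions A1, A2,
A3 and A4, for sufficiently large $n$, the following
result holds:

$$c_1 \cdot \frac{e^{-n \cdot \inf_{y \in \mathbb{R}_+^J} \Lambda^{(i),*}(y)}}{n^{J/2}} \leq P_0(\hat{\theta}^n = \theta_i) \leq c_2 \cdot \frac{e^{-n \cdot \inf_{y \in \mathbb{R}_+^J} \Lambda^{(i),*}(y)}}{n^{1/2}}$$

for $i = 1, \dots, J$ and for some $0 < c_1 < c_2 < +\infty$.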

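When $J = 1$, a more precise convergence rate can
be obtained under the following assumption:

A6. When $J = 1$, there is a positive value $\mu \in \operatorname{int}(\mathcal{D}_{\Lambda^{(1)}})$
such that $\left. \frac{\partial \Lambda^{(1)}(\lambda)}{\partial \lambda} \right|_{\lambda = \mu} = 0$. Moreover, the law of
$\ln \frac{q(Y; \theta_1)}{q(Y; \theta_0)}$ is nonlattice (see [21], page 110).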

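Proposition 5. Under Assumptions A1, A2, A3,
A4 and A6, with $\Theta = \{\theta_0, \theta_1\}$ and $J = 1$, we have

$$\begin{aligned}
P_0(\hat{\theta}^n = \theta_1) = P_0(\hat{\theta}^n \neq \theta_0)
&= \frac{e^{n \cdot \Lambda^{(1)}(\mu)}}{\mu \sqrt{\Lambda^{(1)\prime\prime}(\mu) \, 2 \pi n}} \cdot (1 + o(1)) \\
&= \frac{e^{-n \cdot \Lambda^{(1),*}(0)}}{(\Lambda^{(1),*})'(0)} \sqrt{\frac{(\Lambda^{(1),*})''(0)}{2 \pi n}} \cdot (1 + o(1)).
\end{aligned}$$
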

Remark 6. A refinement of the previous asymptotic
rates can be obtained using results in [2, 10].
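
A quick numerical check of the exact asymptotics for $J = 1$ is possible in the Gaussian two-point model, where the error probability is known in closed form. The sketch below assumes a correctly specified Gaussian model with known `sigma` and means at distance `delta`, so that $\sum_k X_k^{(1)} \sim N(-nd, 2nd)$ with $d = \delta^2 / (2\sigma^2)$, $\mu = 1/2$, $\Lambda^{(1)}(\mu) = -d/4$ and $\Lambda^{(1)\prime\prime}(\mu) = 2d$. It compares the exact probability with an approximation of the form $e^{n \Lambda^{(1)}(\mu)} / (\mu \sqrt{\Lambda^{(1)\prime\prime}(\mu) 2 \pi n})$; the function names are hypothetical.

```python
import math


def exact_error_probability(n, delta, sigma):
    """P_0(sum_k X_k^{(1)} > 0) when the sum is N(-n*d, 2*n*d)."""
    d = delta ** 2 / (2.0 * sigma ** 2)
    x = math.sqrt(n * d / 2.0)
    # Upper tail of the standard normal distribution at x.
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def exact_asymptotics(n, delta, sigma):
    """Leading term exp(n*Lambda(mu)) / (mu * sqrt(Lambda''(mu) * 2*pi*n))."""
    d = delta ** 2 / (2.0 * sigma ** 2)
    mu = 0.5                                  # solves d/d(lambda) Lambda = 0
    log_mgf_at_mu = -d * mu + d * mu ** 2     # equals -d / 4
    curvature = 2.0 * d                       # second derivative of Lambda
    return math.exp(n * log_mgf_at_mu) / (
        mu * math.sqrt(curvature * 2.0 * math.pi * n)
    )


ratios = {
    n: exact_error_probability(n, 1.0, 1.0) / exact_asymptotics(n, 1.0, 1.0)
    for n in (25, 100, 400)
}
for n, ratio in ratios.items():
    print(n, ratio)
```

The ratio stays below one and approaches one as $n$ grows, which is the behavior expected from a $(1 + o(1))$ correction factor.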

3.2.3 Saddlepoint approximation. In this section we
consider a different kind of approximation of the
probabilities $P_0(\hat{\theta}^n = \theta_i)$.
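Theorem 1. Under Assumptions A1, A2 and A5,
for $i \neq 0$, it is possible to choose $u$ such that, for every
$v \in [(\operatorname{int} \mathbb{R}_+^J) \ominus \frac{\partial \Lambda^{(i)}(u)}{\partial u}]$, $u^{\mathsf{T}} v \geq 0$ and

$$P_0(\hat{\theta}^n = \theta_i) = \exp\left\{ n \left[ \Lambda^{(i)}(u) - u^{\mathsf{T}} \frac{\partial \Lambda^{(i)}(u)}{\partial u} \right] \right\} \cdot \left[ e_{s-3}(u, \operatorname{int} \mathbb{R}_+^J \ominus E_0 X^{(i)}) + \delta(u, \operatorname{int} \mathbb{R}_+^J \ominus E_0 X^{(i)}) \right],$$

where

$$e_{s-3}(u, \operatorname{int} \mathbb{R}_+^J \ominus E_0 X^{(i)}) = \int_{\operatorname{int} \mathbb{R}_+^J \ominus E_0 X^{(i)}} \frac{\exp(-n u \cdot y - n \|y^*\|^2 / 2)}{(2\pi / n)^{J/2} \Delta^{1/2}} \left[ 1 + \sum_{l=1}^{s-3} n^{-l/2} Q_{lu}(\sqrt{n} y^*) \right] dy,$$

$$Q_{lu}(x) = \sum_{m=1}^{l} \frac{1}{m!} \sum\nolimits^{*} \sum\nolimits^{**} \left( \frac{\kappa_{\nu_1 u} \cdots \kappa_{\nu_m u}}{\nu_1! \cdots \nu_m!} \right) H_{I_1}(x_1) \cdots H_{I_d}(x_d),$$

$\delta(u, \operatorname{int} \mathbb{R}_+^J \ominus E_0 X^{(i)})$ is the remainder term,
and $V = \frac{\partial^2 \Lambda^{(i)}(u)}{\partial u \, \partial u^{\mathsf{T}}}$, $\|y^*\|^2 = y^{\mathsf{T}} V^{-1} y$, $\Delta = |V|$,
$H_m$ is the usual Hermite-Chebyshev
polynomial of degree $m$, $\sum^{*}$ denotes the sum
over all $m$-tuples of positive integers $(j_1, \dots, j_m)$ satisfying
$j_1 + \cdots + j_m = l$, $\sum^{**}$ denotes the sum over
all $m$-tuples $(\nu_1, \dots, \nu_m)$ with $\nu_i = (\nu_{1i}, \dots, \nu_{di})$, satisfying
$(\nu_{1i} + \cdots + \nu_{di} = j_i + 2, \ i = 1, \dots, m)$, and
$I_h = \nu_{h1} + \cdots + \nu_{hm}$, $h = 1, \dots, d$. Note that $Q_{lu}$ depends
on $u$ through the cumulants calculated at $u$.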

Remark 7. The main question that this theorem
leaves open is the choice of the point $u$. Usually
this point is chosen as a solution $u$ of $m(u) = x$;
this corresponds to a saddlepoint in $k(u)$. [20] (Section 6)
and [59] (page 480) give some conditions for
$J = 1$; [41] (page 23) and [7] (page 153) give conditions
for general $J$. [42] suggests that the most
common solution is to choose $x$ and $u$ ($x$ belonging
to the boundary of $[\operatorname{int} \mathbb{R}_+^J \ominus E_0 X^{(i)}]$ and $u$ solving
$m(u) = x$), such that for every
$v \in [\operatorname{int} \mathbb{R}_+^J \ominus \frac{\partial \Lambda^{(i)}(u)}{\partial u}]$, $u^{\mathsf{T}} v \geq 0$.
This is the same as a dominating
point in [65-67]; therefore, A2, A3 and A4, for sufficiently
large $n$, imply the existence of this point for
any $i$.



3.3 The MLE and Bayes Estimators in Discrete 
Parameter Models 

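In this section, we show how the previous results
can be applied to the MLE and Bayes estimators under
the zero-one loss function. The MLE is defined by

$$\hat{\theta}^n = \arg\max_{\theta \in \Theta} \prod_{i=1}^{n} f_Y(y_i; \theta) = \arg\max_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} \ln f_Y(y_i; \theta).$$

This corresponds to the minimum-error-probability
estimate of [69] and to the Bayesian estimator of
[82, 83]. On the other hand, using the prior densities
given by $\pi(\theta)$ for $\theta \in \Theta$, the posterior densities of the
Bayesian estimator are given by

$$P\{\theta_k | Y\} = \frac{\pi(\theta_k) \prod_{i=1}^{n} f_Y(y_i; \theta_k)}{\sum_{j=0}^{J} \pi(\theta_j) \prod_{i=1}^{n} f_Y(y_i; \theta_j)}.$$

The Bayes estimator relative to zero-one loss $\check{\theta}^n$ (see
Section 4.3 for a definition) is the mode of the posterior
distribution and is given by

$$\check{\theta}^n = \arg\max_{\theta \in \Theta} \ln P\{\theta | Y\} = \arg\max_{\theta \in \Theta} \left[ \frac{1}{n} \sum_{i=1}^{n} \ln f_Y(y_i; \theta) + \frac{\ln \pi(\theta)}{n} \right]. \tag{3}$$

Note that the MLE coincides with the Bayes estimator
corresponding to the uniform distribution
$\pi(\theta) = (J + 1)^{-1}$ for any $\theta \in \Theta$.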
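
The two estimators of this section can be sketched in Python as follows. The Gaussian log-density with unit variance, the parameter set $\{0, 1, 2\}$, the four data points and the two priors are hypothetical values chosen so that the computations can be checked by hand; the function names are not from the paper.

```python
import math


def log_likelihood(data, theta, log_density):
    return sum(log_density(y, theta) for y in data)


def mle(data, thetas, log_density):
    """Maximizer of the log-likelihood over the finite set `thetas`."""
    return max(thetas, key=lambda th: log_likelihood(data, th, log_density))


def bayes_zero_one(data, prior, log_density):
    """Posterior mode; `prior` maps each theta to a strictly positive weight."""
    return max(
        prior,
        key=lambda th: log_likelihood(data, th, log_density) + math.log(prior[th]),
    )


def posterior(data, prior, log_density):
    """Posterior probabilities P{theta_k | Y}, computed stably on the log scale."""
    logs = {
        th: log_likelihood(data, th, log_density) + math.log(p)
        for th, p in prior.items()
    }
    top = max(logs.values())
    weights = {th: math.exp(v - top) for th, v in logs.items()}
    total = sum(weights.values())
    return {th: w / total for th, w in weights.items()}


def gaussian_log_density(y, theta):
    return -0.5 * (y - theta) ** 2 - 0.5 * math.log(2.0 * math.pi)


data = [0.9, 1.4, 0.6, 1.3]
uniform_prior = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
skewed_prior = {0: 0.001, 1: 0.001, 2: 0.998}

theta_mle = mle(data, [0, 1, 2], gaussian_log_density)
theta_bayes_uniform = bayes_zero_one(data, uniform_prior, gaussian_log_density)
theta_bayes_skewed = bayes_zero_one(data, skewed_prior, gaussian_log_density)
post = posterior(data, uniform_prior, gaussian_log_density)
print(theta_mle, theta_bayes_uniform, theta_bayes_skewed)
```

With the uniform prior the posterior mode coincides with the MLE, as noted above. With the strongly skewed prior and only four observations, the term $\ln \pi(\theta)$ dominates and the Bayes estimator moves to $\theta = 2$; since this term is divided by $n$ in (3), its influence vanishes as the sample grows.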

Assumption A1 can be replaced by the following
ones (where Assumptions A8 and A9 entail that the
likelihood function is asymptotically maximized at $\theta_0$
only):

A7. The parametric statistical model $\mathcal{P}$ is formed by
a set of probability measures on a measurable
space $(\Omega, \mathcal{A})$ indexed by a parameter $\theta$ ranging
over a parameter space $\Theta = \{\theta_0, \theta_1, \dots, \theta_J\}$,
of finite cardinality. Let $(\mathbb{Y}, \mathcal{Y})$ be a measurable
space and $\mu$ a positive $\sigma$-finite measure
defined on $(\mathbb{Y}, \mathcal{Y})$ such that, for every $\theta \in \Theta$,
$P_\theta$ is equivalent to $\mu$; the densities $f_Y(Y; \theta)$ are
$\mathcal{Y}$-measurable for each $\theta \in \Theta$.

The data $(Y_i)_{i=1}^{n}$ are i.i.d. realizations from
the probability measure $P_0$.

A8. The log density satisfies the $L^1$-domination
condition $E_0 |\ln f_Y(Y; \theta_i)| < +\infty$, for $\theta_i \in \Theta$,
where $E_0$ denotes the expectation taken under
the true probability measure $P_0$.

A9. $\theta_0$ is the point of $\Theta$ maximizing $E_0 \ln f_Y(Y; \theta)$
and is globally identified.
In order to obtain the consistency of Bayes estimators,
we need the following assumption on the
behavior of the prior distribution:

A10. The prior distribution verifies $\pi(\theta) > 0$ for any
$\theta \in \Theta$.

Proposition 1 holds for the MLE under Assumptions
A7, A8 and A9, while for Bayes estimators A10
is required, too. Note that, under correct specification
(i.e., when the true parameter value belongs
to $\Theta$), a standard Wald's argument (see, e.g., Lemma 2.2
in [64], page 2124) shows that $E_{\theta_0} \ln f_Y(Y; \theta)$
is maximized for $\theta = \theta_0$.

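As concerns the distribution of the MLE, we have
to consider the case in which $q(y; \theta)$ is given by
$f_Y(y; \theta)$, $Q_n(\theta)$ by the log-likelihood function $L_n(\theta)$,
and $X_k$ and $X_k^{(i)}$ by the log-likelihood processes:

$$\begin{aligned}
L_n(\theta_j) &= \frac{1}{n} \sum_{i=1}^{n} \ln f_Y(y_i; \theta_j), \\
X_k^{(i)} &= [\ln f_Y(Y_k; \theta_i) - \ln f_Y(Y_k; \theta_j)]_{j = 0, \dots, J, \, j \neq i}, \\
X_k = X_k^{(0)} &= [\ln f_Y(Y_k; \theta_0) - \ln f_Y(Y_k; \theta_j)]_{j = 1, \dots, J}.
\end{aligned}$$

Also $M(\lambda)$ and $M^{(i)}(\lambda)$ are consequently defined.
Propositions 2 and 3 hold when Assumption A1 is
replaced by Assumptions A7, A8 and A9.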

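When the model is correctly specified, it is interesting
to stress an interpretation of the moment
generating function in discrete parameter models.
We note that the moment generating functions can
be written as follows:

$$M^{(i)}(\lambda) = \int f_Y(y; \theta_i)^{\sum_{j = 0, \dots, J, \, j \neq i} \lambda_j} \cdot \prod_{j = 1, \dots, J, \, j \neq i} f_Y(y; \theta_j)^{-\lambda_j} \cdot f_Y(y; \theta_0)^{1 - \lambda_0} \, \mu(dy). \tag{4}$$

Therefore, in this case, the moment generating function
$M^{(i)}(\lambda)$ reduces to the so-called Hellinger transform
$\mathsf{H}_\gamma(\theta_0, \dots, \theta_J)$ (see [56], page 43) for a certain
linear transformation of $\lambda$ in $\gamma$:

$$\mathsf{H}_\gamma(\theta_0, \dots, \theta_J) = \int \prod_{j=0}^{J} f_Y(y; \theta_j)^{\gamma_j} \, \mu(dy), \qquad \sum_{j=0}^{J} \gamma_j = 1.$$

Moreover, due to its convexity, $\mathsf{H}_\gamma(\theta_0, \dots, \theta_J)$ is surely
finite for $\gamma$ belonging to the closed simplex in $\mathbb{R}^{J+1}$.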

Proposition 4 holds if Assumption A1 is replaced
by Assumptions A7, A8 and A9, and if A2 and A3
hold true. However, Assumption A4 is unnecessary;
indeed, the fact that $\operatorname{int}(\mathbb{R}_+^J \cap S^{(i)}) \neq \emptyset$ can be proved
showing that $0 \in \operatorname{int}(S^{(i)})$. This is equivalent to the
existence, for $j = 1, \dots, J$, $j \neq i$, of two sets $A^*$ and $A^{**}$
of positive $\mu$-measure and included in the support
of $Y$ such that, for $y^* \in A^*$ and $y^{**} \in A^{**}$, $f_Y(y^*; \theta_i) >
f_Y(y^*; \theta_j)$ and $f_Y(y^{**}; \theta_i) < f_Y(y^{**}; \theta_j)$. This follows
easily noting that these densities have to integrate
to 1, are almost surely (a.s.) different according to
Assumption A9 and have the same support according
to Assumption A7.

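In order to derive the distribution of Bayes estimators,
we consider Equation (3) and we let
$\ln \pi^{(i)} = \left[ \ln \frac{\pi(\theta_i)}{\pi(\theta_j)} \right]_{j = 0, \dots, J, \, j \neq i}$. Then, we can write

$$P_0(\check{\theta}^n = \theta_i) = P_0\left( \sum_{k=1}^{n} X_k^{(i)} \in \operatorname{int} \prod_{j = 0, \dots, J, \, j \neq i} \left[ \ln \frac{\pi(\theta_j)}{\pi(\theta_i)}, +\infty \right) \right)$$

and we can use the previous large deviations or saddlepoint
formulas, simply changing the set over which
the inf is taken. However, care is needed since both
formulas hold under the assumption

$$E_0 X^{(i)} + \frac{1}{n} \ln \pi^{(i)} \notin \operatorname{int}(\mathbb{R}_+^J).$$
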

In the case J = 1, the similarity of these formulas 
with the corresponding ones for a Neyman-Pearson 
test is striking; this revives the interpretation of 
a Neyman-Pearson test as a Bayesian estimation 
problem. Therefore, our analysis can be seen as a (mi- 
nor) extension of the theory of hypothesis testing to 
a larger number of alternatives. 

4. OPTIMALITY AND EFFICIENCY 

In this section, we are interested in the problem of
efficiency, with special reference to maximum likelihood
and Bayes estimators. In what follows, we will
suppose that the true parameter value belongs to $\Theta$;
this will be reflected in the probabilities that will be
written as $P_0 = P_{\theta_0}$. Indeed, efficiency statements for
misspecified models are quite difficult to interpret.

In the statistics literature, efficiency (or superef- 
ficiency) can be defined comparing the behavior of 
the estimator with respect to a lower bound or, alternatively,
to a class of estimators. In the continuous
case, the two concepts almost coincide (despite
superefficiency). However, in the discrete case, the
two concepts diverge dramatically and we need more 
care in the derivation of the information inequalities 
and in the statement of the efficiency properties. 

An interesting problem concerns the choice of
a measure of efficiency for the MLE in discrete parameter
models: in his seminal paper, Hammersley [33]
derives a generalization of the Cramér-Rao inequality
for the variance that is also valid when the
parameter space is countable. The same inequality
has been derived, in slightly more generality, in
[12, 16]. However, this choice is well-suited only in
cases in which the MSE is a good measure of risk, for
example, if the limiting distribution of the normalized
estimator is normal. Following the discussion
by Lindley in [33], we consider a different cost function
$C_1(\tilde{\theta}^n, \theta_0)$, whose risk function is given by the
probability of misclassification:

$$C_1(\tilde{\theta}^n, \theta_0) = \mathbf{1}\{\tilde{\theta}^n \neq \theta_0\}, \qquad \mathcal{R}_1(\tilde{\theta}^n, \theta_0) = E_{\theta_0} C_1(\tilde{\theta}^n, \theta_0) = P_{\theta_0}(\tilde{\theta}^n \neq \theta_0).$$

We also define the Bayes risk (under the zero-one
loss function) associated with a prior distribution $\pi$
on the parameter space $\Theta$. In particular, we consider
the Bayes risk under the risk function $\mathcal{R}_1(\tilde{\theta}^n, \theta_0)$ as

$$r_1(\tilde{\theta}^n, \pi) = \sum_{j=0}^{J} \pi(\theta_j) \cdot P_{\theta_j}(\tilde{\theta}^n \neq \theta_j).$$

If $\pi(\theta_j) = (J + 1)^{-1}$ we define $P_e = r_1(\tilde{\theta}^n, \pi)$ as the
average probability of error. Note that this is indeed
the measure of error used by [82, 83].

Using the risk function $\mathcal{R}_1$, in Section 4.1 we derive
some information inequalities and we prove in
Section 4.2 some optimality and efficiency results for
Bayes and ML estimators. In Section 4.3 we briefly
deal with alternative risk functions.

4.1 Information Inequalities

This section contains lower bounds for the previously
introduced risk function $\mathcal{R}_1$. In the specific
case of discrete parameters, these generalize and unify
the lower bounds proposed in [16, 32, 33, 45].

In the following, first of all, a lower bound is proved
and then a minimax version of the same result is obtained.
When needed, we will refer to the former as
Chapman-Robbins lower bound (and to the related
efficiency concept as Chapman-Robbins efficiency)
since it recalls the lower bound proposed by these
two authors in their 1951 paper, and to the latter
as minimax Chapman-Robbins lower bound. Then,
from these results, we derive a lower bound for the
Bayes risk.

4.1.1 Lower bounds for the risk function $\mathcal{R}_1$. The
proposition of this section is intended to play the
role of Cramér-Rao and Chapman-Robbins lower
bounds for the variance. It corresponds essentially
to Stein's Lemma in hypothesis testing. Moreover,
a version of the same bound for estimators respecting (6)
is provided; this corresponds to a similar result
proposed in [23].

Proposition 6. Under Assumptions A7 and A9,
for a strongly consistent estimator $\tilde{\theta}^n$:

$$\lim_{n \to \infty} \frac{1}{n} \ln \mathcal{R}_1(\tilde{\theta}^n, \theta_0) \geq \sup_{\theta_1 \in \Theta \setminus \{\theta_0\}} E_{\theta_1} \ln \frac{f_Y(Y; \theta_0)}{f_Y(Y; \theta_1)}. \tag{5}$$

On the other hand, if

$$\limsup_{n \to \infty} P_{\theta_1}\{\tilde{\theta}^n \neq \theta_1\} < 1, \tag{6}$$

then

$$\liminf_{n \to \infty} \frac{1}{n} \ln \mathcal{R}_1(\tilde{\theta}^n, \theta_0) \geq \sup_{\theta_1 \in \Theta \setminus \{\theta_0\}} E_{\theta_1} \ln \frac{f_Y(Y; \theta_0)}{f_Y(Y; \theta_1)}.$$


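Remark 8. (i) Note that this inequality only
holds for estimators that are consistent or respect
condition (6), while the one of Proposition 7 holds
for any estimator.

(ii) Proposition 6 provides an upper bound for the
inaccuracy rate of [45]:

$$e(\varepsilon, \theta_0, \tilde{\theta}^n) \leq \inf_{\theta_1 \in \Theta \setminus \{\theta_0\}} -E_{\theta_1} \ln \frac{f_Y(Y; \theta_0)}{f_Y(Y; \theta_1)}$$

for any $\varepsilon$ small enough ($\varepsilon < \min_{\theta_i \in \Theta \setminus \{\theta_0\}} \|\theta_i - \theta_0\|$).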

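4.1.2 Minimax lower bounds for the risk function $\mathcal{R}_1$.
The following result is a minimax lower bound on
the probability of misclassification. It is based on the
Neyman-Pearson Lemma and Chernoff's Bound.

Proposition 7. Under Assumptions A7 and A9,
for any estimator $\tilde{\theta}^n$:

$$\liminf_{n \to \infty} \frac{1}{n} \ln \sup_{\theta_0 \in \Theta} \mathcal{R}_1(\tilde{\theta}^n, \theta_0) \geq \sup_{\theta_1 \in \Theta \setminus \{\theta_0\}} \sup_{\theta_0 \in \Theta} \ln \left[ \inf_{1 > u > 0} \int f_Y(y; \theta_1)^{u} f_Y(y; \theta_0)^{1-u} \, \mu(dy) \right]. \tag{7}$$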
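
The two bounds can be compared numerically in a simple case. The sketch below assumes a Gaussian model with known standard deviation `sigma` and a finite set of candidate means. In this model $E_{\theta_1} \ln [f_Y(Y; \theta_0) / f_Y(Y; \theta_1)] = -(\theta_1 - \theta_0)^2 / (2\sigma^2)$ and $\int f_Y(y; \theta_1)^{u} f_Y(y; \theta_0)^{1-u} \mu(dy) = \exp\{-u(1-u)(\theta_1 - \theta_0)^2 / (2\sigma^2)\}$; the function names and the grid over $u$ are choices made for this illustration.

```python
def chapman_robbins_bound(theta0, thetas, sigma):
    """sup over theta1 != theta0 of E_{theta1} ln[f(Y; theta0) / f(Y; theta1)]."""
    return max(
        -((t1 - theta0) ** 2) / (2.0 * sigma ** 2)
        for t1 in thetas
        if t1 != theta0
    )


def minimax_chernoff_bound(thetas, sigma, grid_size=999):
    """sup over pairs of ln inf_{0<u<1} of the integral of f1^u * f0^(1-u)."""
    u_grid = [k / (grid_size + 1) for k in range(1, grid_size + 1)]
    best = float("-inf")
    for t0 in thetas:
        for t1 in thetas:
            if t1 == t0:
                continue
            # Gaussian closed form of the log of the integral, minimized over u.
            log_integral = min(
                -u * (1.0 - u) * (t1 - t0) ** 2 / (2.0 * sigma ** 2)
                for u in u_grid
            )
            best = max(best, log_integral)
    return best


thetas = [0, 1, 2, 3]
classical = chapman_robbins_bound(0, thetas, 1.0)
minimax = minimax_chernoff_bound(thetas, 1.0)
print(classical, minimax)
```

For unit spacing and unit variance the first bound equals $-1/2$ and the second equals $-1/8$, the latter being attained at $u = 1/2$ for neighboring points. The value $-1/8 = -\delta^2 / (8\sigma^2)$ is also the exponential rate of the error probability of the MLE between two neighboring means in this model.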
Remark 9. (i) The previous proposition provides
an expression for the minimax Bahadur risk
(also called (minimax) rate of inaccuracy; see [1, 51])
analogous to Chernoff's Bound, thus providing a minimax
version of Remark 8(ii).

(ii) Other methods to derive similar minimax inequalities
are Fano's Inequality and Assouad's Lemma
(see [56], page 220); however, in the present case
they do not allow us to obtain tight bounds, since
the usual application of these methods relies on the
approximation of the parameter space with a finite
set of points $\Theta$ whose cardinality increases with $n$.
Clearly, this cannot be done in the present case.

(iii) Using Lemma 5.2 in [70], it is possible to
show that the minimax bound is larger than the
classical one.

(iv) Under Assumption A10, the Bayes risk $r_1$ under
the risk function $\mathcal{R}_1$ and the prior $\pi$ respects the
equality

$$\lim_{n \to \infty} \frac{1}{n} \ln r_1(\tilde{\theta}^n, \pi) = \lim_{n \to \infty} \frac{1}{n} \ln \max_{\theta_0 \in \Theta} \mathcal{R}_1(\tilde{\theta}^n, \theta_0). \tag{8}$$

Then, Proposition 7 holds also for the Bayes risk:
clearly this bound is independent of the prior distribution
$\pi$ (provided it is strictly positive, i.e., A10
holds) and also holds for the probability of error $P_e$.
This inequality can be seen as an asymptotic version
of the van Trees inequality for a different risk
function.

4.2 Optimality and Efficiency

In this section, we establish some optimality results
for the MLE in discrete parameter models. The
situation is much more intricate than in regular statistical
models under the quadratic loss function, in
which efficiency coincides with the attainment of the
Cramér-Rao lower bound (despite superefficiency).
Therefore, we propose the following definition. We
denote by $\mathcal{R} = \mathcal{R}(\bar{\theta}^n, \theta_0)$ the risk function of the estimator
$\bar{\theta}^n$ evaluated at $\theta_0$, and by $\mathcal{G}$ a class of estimators.

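Definition 1. The estimator $\bar{\theta}^n$ is efficient with respect to (w.r.t.) $\widetilde{\Theta}$ and w.r.t. $\mathcal{R}$ at $\theta_0$ if

$$\mathcal{R}(\bar{\theta}^n,\theta_0) \le \mathcal{R}(\tilde{\theta}^n,\theta_0) \qquad \forall\,\tilde{\theta}^n\in\widetilde{\Theta}. \tag{9}$$

The estimator $\bar{\theta}^n$ is minimax efficient w.r.t. $\widetilde{\Theta}$ and w.r.t. $\mathcal{R}$ if

$$\sup_{\theta_0\in\Theta}\mathcal{R}(\bar{\theta}^n,\theta_0) \le \sup_{\theta_0\in\Theta}\mathcal{R}(\tilde{\theta}^n,\theta_0) \qquad \forall\,\tilde{\theta}^n\in\widetilde{\Theta}. \tag{10}$$

The estimator $\bar{\theta}^n$ is superefficient w.r.t. $\widetilde{\Theta}$ and w.r.t. $\mathcal{R}$ if for every $\tilde{\theta}^n\in\widetilde{\Theta}$:

$$\mathcal{R}(\bar{\theta}^n,\theta_0) \le \mathcal{R}(\tilde{\theta}^n,\theta_0)$$

for every $\theta_0\in\Theta$ and there exists at least a value $\theta_0^{\ast}\in\Theta$ such that the inequality is replaced by a strict inequality for $\theta_0=\theta_0^{\ast}$.

The estimator $\bar{\theta}^n$ is asymptotically CR-efficient w.r.t. $\mathcal{R}$ at $\theta_0$ if it attains the Chapman-Robbins lower bound of Proposition 6 at $\theta_0$ [say CR-$\mathcal{R}(\theta_0)$] in the asymptotic form:

$$\liminf_{n\to\infty}\frac{1}{n}\ln\mathcal{R}(\bar{\theta}^n,\theta_0) = \ln\text{CR-}\mathcal{R}(\theta_0).$$

The estimator $\bar{\theta}^n$ is asymptotically minimax CR-efficient w.r.t. $\mathcal{R}$ if it attains the minimax Chapman-Robbins lower bound of Proposition 7 (say CR-$\mathcal{R}_{\max}$) in the asymptotic form:

$$\liminf_{n\to\infty}\frac{1}{n}\ln\sup_{\theta_0\in\Theta}\mathcal{R}(\bar{\theta}^n,\theta_0) = \ln\text{CR-}\mathcal{R}_{\max}.$$

The estimator $\bar{\theta}^n$ is asymptotically CR-superefficient w.r.t. $\mathcal{R}$ if

$$\liminf_{n\to\infty}\frac{1}{n}\ln\mathcal{R}(\bar{\theta}^n,\theta_0) \le \ln\text{CR-}\mathcal{R}(\theta_0)$$

for every $\theta_0\in\Theta$ and there exists at least a value $\theta_0^{\ast}\in\Theta$ such that the inequality is replaced by a strict inequality for $\theta_0=\theta_0^{\ast}$.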

Remark 10. As in Remark 8(ii), it is easy to see that IR-optimality and CR-efficiency w.r.t. $\mathcal{R}_1$ coincide.

The efficiency landscape offered by discrete parameter models will be illustrated by Example 6. This shows that, even in the simplest case, that is, the estimation of the integer mean of a Gaussian random variable with known variance, the MLE does not attain the lower bound on the misclassification probability but it attains the minimax lower bound. Moreover, simple estimators are built that outperform the MLE for certain values of the true parameter value $\theta_0$.

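Example 6. Let us consider the estimation of the mean of a Gaussian distribution whose variance $\sigma^2$ is known: we suppose that the true mean is $\alpha$, while the parameter space is $\{-\alpha,\alpha\}$, where $\alpha$ is known. The maximum likelihood estimator $\hat{\theta}^n$ takes the value $-\alpha$ if the sample mean takes on its value in $(-\infty,0)$ and $\alpha$ if it falls in $[0,+\infty)$ (the position of 0 is a convention). Therefore:

$$\mathbb{P}_{\theta_0}(\hat{\theta}^n\ne\theta_0) = \mathbb{P}_{\theta_0}(\hat{\theta}^n=-\alpha) = \int_{-\infty}^{0}\frac{e^{-(\bar{y}-\alpha)^2/(2\sigma^2/n)}}{\sqrt{2\pi\sigma^2/n}}\,d\bar{y} = \int_{-\infty}^{-\sqrt{n}\,\alpha/\sigma}\frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt = \frac{e^{-n\alpha^2/(2\sigma^2)}}{\sqrt{2\pi n}}\,\frac{\sigma}{\alpha}\left(1+O\!\left(\frac{1}{n}\right)\right),$$

where we have used Problem 1 on page 193 in [22]. Proposition 5 also allows us to recover the right convergence rate. Indeed, we have

$$\mathbb{P}_{\theta_0}(\hat{\theta}^n\ne\theta_0) = \mathbb{P}_{\theta_0}(\hat{\theta}^n=-\alpha) = \frac{e^{-n\alpha^2/(2\sigma^2)}}{\sqrt{2\pi n}}\,\frac{\sigma}{\alpha}\,(1+o(1)).$$

On the other hand, the lower bound of Proposition 6 yields

$$\liminf_{n\to\infty}\frac{1}{n}\ln\mathbb{P}_{\theta_0}(\tilde{\theta}^n\ne\theta_0) \ge -\frac{2\alpha^2}{\sigma^2},$$

and the lower bound of Proposition 7 yields

$$\liminf_{n\to\infty}\frac{1}{n}\sup_{\theta_0\in\Theta}\ln\mathbb{P}_{\theta_0}(\tilde{\theta}^n\ne\theta_0) \ge -\frac{\alpha^2}{2\sigma^2}.$$

Therefore, the MLE asymptotically attains the minimax lower bound but not the classical one.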
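
The comparison in this example can be checked numerically. The sketch below is only an illustration under assumed values ($\alpha=\sigma=1$; the function names are hypothetical): it computes the exact error probability $\Phi(-\sqrt{n}\,\alpha/\sigma)$ of the MLE, its leading asymptotic term, and the normalized log-error $\frac{1}{n}\ln\mathbb{P}$, which can then be compared with the two rates $-2\alpha^2/\sigma^2$ and $-\alpha^2/(2\sigma^2)$.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the complementary error function (stable in the tails)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def mle_error_exact(n, alpha, sigma):
    """Exact P(sample mean < 0) when the true mean is alpha."""
    return normal_cdf(-math.sqrt(n) * alpha / sigma)

def mle_error_asymptotic(n, alpha, sigma):
    """Leading term sigma / (alpha * sqrt(2*pi*n)) * exp(-n * alpha^2 / (2 * sigma^2))."""
    return (sigma / (alpha * math.sqrt(2.0 * math.pi * n))
            * math.exp(-n * alpha ** 2 / (2.0 * sigma ** 2)))

def log_rate(n, alpha, sigma):
    """Normalized log-error (1/n) * ln P(error) of the MLE."""
    return math.log(mle_error_exact(n, alpha, sigma)) / n

if __name__ == "__main__":
    alpha, sigma = 1.0, 1.0  # illustrative values, not from the paper
    classical_rate = -2.0 * alpha ** 2 / sigma ** 2    # rate from Proposition 6
    minimax_rate = -alpha ** 2 / (2.0 * sigma ** 2)    # rate from Proposition 7
    for n in (10, 50, 200):
        ratio = mle_error_exact(n, alpha, sigma) / mle_error_asymptotic(n, alpha, sigma)
        print(n, ratio, log_rate(n, alpha, sigma), minimax_rate, classical_rate)
```

As $n$ grows, the ratio of the exact to the asymptotic error should approach 1 and the normalized log-error should approach the minimax rate while staying well above the classical one.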

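In the following, we will show that estimators can be pointwise more efficient than the MLE; consider the estimator defined by

$$\tilde{\theta}^n(k) = \begin{cases}\theta_0 & \text{if } \ell_n(\theta_0)+k\cdot n \ge \ell_n(\theta_1),\\ \theta_1 & \text{else},\end{cases}$$

where $\ell_n(\theta)=\ln L_n(\theta)$ denotes the log-likelihood of the sample at $\theta$, $\theta_0=\alpha$ and $\theta_1=-\alpha$. When $k=0$, $\tilde{\theta}^n(k)$ coincides with the MLE $\hat{\theta}^n$. Then, the behavior of the estimator is characterized by the probabilities:

$$\mathbb{P}_{\theta_0}(\tilde{\theta}^n(k)=\theta_0) = \Phi\!\left(\frac{k\cdot n\cdot\sigma^2+2\alpha^2\cdot n}{2\alpha\sigma\sqrt{n}}\right), \qquad \mathbb{P}_{\theta_1}(\tilde{\theta}^n(k)=\theta_0) = \Phi\!\left(\frac{k\cdot n\cdot\sigma^2-2\alpha^2\cdot n}{2\alpha\sigma\sqrt{n}}\right).$$

We have (weak) consistency if

$$-2\left(\frac{\alpha}{\sigma}\right)^2 < k < 2\left(\frac{\alpha}{\sigma}\right)^2. \tag{11}$$

The risk $\mathcal{R}_1(\tilde{\theta}^n(k),\theta_0)$ under $\theta_0$ is then

$$\mathbb{P}_{\theta_0}(\tilde{\theta}^n(k)\ne\theta_0) = \Phi\!\left(-\sqrt{n}\cdot\frac{k\cdot\sigma^2+2\alpha^2}{2\alpha\sigma}\right);$$

this can be made smaller than the probability of error of the MLE simply by taking $k>0$, thus implying that the MLE is not pointwise efficient.

Now, we show that this estimator cannot converge faster than the Chapman-Robbins lower bound without losing its consistency. Indeed, $\mathbb{P}_{\theta_0}(\tilde{\theta}^n(k)\ne\theta_0)$ is smaller than the Chapman-Robbins lower bound if

$$k^2+4k\left(\frac{\alpha}{\sigma}\right)^2-12\left(\frac{\alpha}{\sigma}\right)^4 > 0,$$

and this is never true under (11). If this estimator is pointwise more efficient than the MLE under $\theta_0$, then its risk under $\theta_1$ is given by

$$\mathbb{P}_{\theta_1}(\tilde{\theta}^n(k)\ne\theta_1) = \Phi\!\left(\sqrt{n}\cdot\frac{k\cdot\sigma^2-2\alpha^2}{2\alpha\sigma}\right),$$

and this is greater than for the MLE. This shows that a faster convergence rate can be obtained in some points, the price to pay being a worse convergence rate elsewhere in $\Theta$.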
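
The trade-off described above can be made concrete with a short numerical sketch. It assumes the following convention, which is a reading of the example rather than a quotation of it: the parameter space is $\{\theta_0,\theta_1\}=\{\alpha,-\alpha\}$, and the modified estimator selects $\theta_0$ whenever the log-likelihood at $\theta_0$ plus $k\cdot n$ is at least the log-likelihood at $\theta_1$, so that $k>0$ favors $\theta_0$ and $k=0$ gives the MLE. The function name `risks` and the numerical values are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def risks(k, n, alpha, sigma):
    """Error probabilities of the shifted-threshold rule (assumed convention).

    Returns (risk under theta0 = +alpha, risk under theta1 = -alpha) for the rule
    'choose theta0 if loglik(theta0) + k*n >= loglik(theta1)'.
    """
    scale = math.sqrt(n) / (2.0 * alpha * sigma)
    risk_theta0 = normal_cdf(-scale * (k * sigma ** 2 + 2.0 * alpha ** 2))
    risk_theta1 = normal_cdf(scale * (k * sigma ** 2 - 2.0 * alpha ** 2))
    return risk_theta0, risk_theta1

if __name__ == "__main__":
    n, alpha, sigma = 25, 1.0, 1.0  # illustrative values
    for k in (0.0, 0.5, 1.0):
        print(k, risks(k, n, alpha, sigma))
```

Under this convention, increasing $k$ from 0 lowers the error probability at $\theta_0$ and raises it at $\theta_1$, which is the pointwise gain and the compensating loss discussed in the example.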

4.2.1 Optimality w.r.t. classes of estimators In the following section, we show some optimality properties of Bayes and ML estimators. We start with an important and well-known fact.

Proposition 8. Under A7, A8, A9 and A10, the Bayes risk $r_1(\tilde{\theta}^n,\pi)$ (under the zero-one loss function) associated with a prior distribution $\pi$ is strictly minimized by the posterior mode corresponding to the prior $\pi$, for any finite $n$.

The following proposition shows that the MLE is admissible and minimax efficient under the zero-one loss and minimizes the average probability of error. It implies that estimators that are more efficient than the MLE at a certain point $\theta_0\in\Theta$ are less efficient in at least another point $\theta_1\in\Theta$. As a result, estimators can be more efficient than minimax efficient ones only on portions of the parameter space, but are then strictly less efficient elsewhere.

Proposition 9. Under Assumptions A7, A8 and A9, the MLE is admissible and minimax efficient w.r.t. the class of all estimators and w.r.t. $\mathcal{R}_1$ and minimizes the average probability of error $P_e$.
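
The link between the posterior mode and the MLE that underlies this result can be illustrated with a small sketch. It assumes a Gaussian sample with known variance and an integer-valued mean restricted to a finite set (in the spirit of the Gaussian integer-mean case); the data, the candidate set and the function names are hypothetical. With a uniform prior the posterior mode and the MLE coincide; with a strongly non-uniform prior they may differ.

```python
import math

def log_likelihood(theta, data, sigma=1.0):
    """Gaussian log-likelihood at mean theta, up to an additive constant."""
    return -sum((y - theta) ** 2 for y in data) / (2.0 * sigma ** 2)

def mle(data, thetas, sigma=1.0):
    """Maximum likelihood estimator over the finite set thetas."""
    return max(thetas, key=lambda t: log_likelihood(t, data, sigma))

def posterior_mode(data, thetas, prior, sigma=1.0):
    """Posterior mode over thetas for a strictly positive prior (dict theta -> mass)."""
    return max(thetas, key=lambda t: log_likelihood(t, data, sigma) + math.log(prior[t]))

if __name__ == "__main__":
    data = [0.9, 1.4, 2.2, 1.1]            # made-up sample
    thetas = [0, 1, 2, 3]                  # made-up finite parameter space
    uniform = {t: 1.0 / len(thetas) for t in thetas}
    skewed = {0: 0.01, 1: 0.01, 2: 0.97, 3: 0.01}
    print(mle(data, thetas), posterior_mode(data, thetas, uniform),
          posterior_mode(data, thetas, skewed))
```

The uniform prior adds the same constant to every candidate's log-posterior, so the maximizer is unchanged; this is the mechanism used in the Bayesian argument for admissibility and minimaxity.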

4.2.2 Optimality w.r.t. the information inequali- 
ties In this subsection, we will show that the MLE 
does not attain the Chapman-Robbins lower bound 
in the form of Proposition 6 but that it attains the 
minimax form of Proposition 7 and that efficiency 
and minimax efficiency are generally incompatible. 

Therefore, the situation described in Example 6 is 
general, for it is possible to show that the MLE is 
generally inefficient with respect to the lower bounds 
exposed in Proposition 6. 

Proposition 10. Under Assumptions A7, A8 and A9:

(i) the MLE is not asymptotically CR-efficient w.r.t. $\mathcal{R}_1$ at $\theta_0$;

(ii) the MLE is asymptotically minimax CR-efficient w.r.t. $\mathcal{R}_1$;

(iii) an estimator that is asymptotically CR-efficient w.r.t. $\mathcal{R}_1$ at $\theta_0$ is not asymptotically minimax CR-efficient w.r.t. $\mathcal{R}_1$.

Remark 11. The assumption of homogeneity of the probability measures, necessary to derive (ii), can be removed in the proof of (i) along the lines of [45].

4.2.3 The evil of superefficiency Ever since it was 
discovered by Hodges, the problem of superefficiency 
has been dealt with extensively in regular statis- 
tical problems (see, e.g., [55, 85]). However, these 
proofs do not transpose to discrete parameter es- 
timation problems, since they are mostly based on 
the equivalence of prior probability measures with 
the Lebesgue measure and on properties of Bayes 
estimators that do not hold in this case. Moreover, 
the discussion of the previous sections has shown 
that, in discrete parameter problems, CR-efficiency 
and efficiency with respect to a class of estimators do 
not coincide. The following proposition yields a so- 
lution to the superefficiency problem. 



Proposition 11. Under Assumptions A7, A8 and A9:

(i) no estimator $\tilde{\theta}^n$ is asymptotically CR-superefficient w.r.t. $\mathcal{R}_1$ at $\theta_0\in\Theta$;

(ii) no estimator $\tilde{\theta}^n$ is superefficient w.r.t. the MLE and $\mathcal{R}_1$.

4.3 Alternative Risk Functions 

Now we consider to what extent the previous results carry over when the risk function is changed. Following [33], we first consider the quadratic cost function and the corresponding risk function:

$$C_2(\tilde{\theta}^n,\theta_0) = (\tilde{\theta}^n-\theta_0)^2, \qquad \mathcal{R}_2(\tilde{\theta}^n,\theta_0) = \mathrm{MSE}(\tilde{\theta}^n).$$

The cost function $C_1$ has the drawback of weighting in the same way points of the parameter space that lie at different distances with respect to the true value $\theta_0$. In many cases, a more general loss function can be considered, as suggested in [30] (Volume 1, page 51) for multiple tests:

$$C_3(\tilde{\theta}^n,\theta_0) = \begin{cases} 0 & \text{if } \tilde{\theta}^n=\theta_0,\\ a_j(\theta_0) & \text{if } \tilde{\theta}^n=\theta_j,\end{cases}$$

where $a_j(\theta_0)>0$ for $j=1,\ldots,J$ can be tuned in order to give more or less weight to different points of the parameter space. The risk function is therefore given by the weighted probability of misclassification $\mathcal{R}_3(\tilde{\theta}^n,\theta_0)=\sum_{j=1}^{J}a_j(\theta_0)\cdot\mathbb{P}_{\theta_0}(\tilde{\theta}^n=\theta_j)$. It is trivial to remark that

$$\lim_{n\to\infty}\frac{1}{n}\ln\mathcal{R}_2(\tilde{\theta}^n,\theta_0) = \lim_{n\to\infty}\frac{1}{n}\ln\mathbb{P}_{\theta_0}(\tilde{\theta}^n\ne\theta_0),$$

$$\liminf_{n\to\infty}\frac{1}{n}\ln\sup_{\theta_0\in\Theta}\mathcal{R}_2(\tilde{\theta}^n,\theta_0) = \liminf_{n\to\infty}\frac{1}{n}\ln\sup_{\theta_0\in\Theta}\mathbb{P}_{\theta_0}(\tilde{\theta}^n\ne\theta_0),$$

and the lower bounds of Propositions 6 and 7 hold also in this case. The same equalities hold also for $\mathcal{R}_3$. As a result, Proposition 10 and Proposition 11(i) apply also to these risk functions.

On the other hand, as concerns Proposition 9 and Proposition 11(ii), it is simple to show that with respect to the risk functions $\mathcal{R}_2(\tilde{\theta}^n,\theta_0)$ and $\mathcal{R}_3(\tilde{\theta}^n,\theta_0)$, the results hold only asymptotically (see [46], for asymptotic minimax efficiency of the estimator of the integral mean of a Gaussian sample with known variance).
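
To fix ideas on how the three risk functions differ, the following sketch computes the zero-one risk, the quadratic risk and the weighted misclassification risk from a given distribution of the estimator over a finite parameter space. The distribution, the weights and the function names are hypothetical and serve only as an illustration.

```python
def risk_zero_one(probs, theta0):
    """Probability of misclassification: total mass on points other than theta0."""
    return sum(p for t, p in probs.items() if t != theta0)

def risk_quadratic(probs, theta0):
    """Mean squared error of the estimator around theta0."""
    return sum(p * (t - theta0) ** 2 for t, p in probs.items())

def risk_weighted(probs, theta0, weights):
    """Weighted probability of misclassification with weights a_j(theta0) > 0."""
    return sum(weights[t] * p for t, p in probs.items() if t != theta0)

if __name__ == "__main__":
    # Made-up distribution of the estimator when the true value is theta0 = 0.
    probs = {0: 0.90, 1: 0.08, 2: 0.02}
    weights = {1: 1.0, 2: 5.0}  # made-up weights a_j(theta0)
    print(risk_zero_one(probs, 0), risk_quadratic(probs, 0),
          risk_weighted(probs, 0, weights))
```

Each of the three risks is a positive linear combination of the same error probabilities $\mathbb{P}_{\theta_0}(\tilde{\theta}^n=\theta_j)$, $j\ne 0$, over a finite set, which is why they share the same exponential rate.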

5. PROOFS 

Proof of Proposition 1. Under A1, Kolmogorov's SLLN implies that $\mathbb{P}_0$-a.s. $\frac{1}{n}\sum_{i=1}^{n}\ln q(Y_i;\theta_j)\to\mathbb{E}_0\ln q(Y;\theta_j)$, and for $\mathbb{P}_0$-a.s. any sequence of realizations, $\hat{\theta}^n$ converges to $\theta_0$. Measurability follows from the fact that the following set belongs to $\mathcal{Y}^{\otimes n}$:


to £Q 



1 

sup- V'lng(yi;6') <t 
di&Q I i=l 



□ 



Proof of Lemma 1. Clearly (ii) implies A2 for 
a certain > 0. On the other hand, suppose that A2 
holds; then, applying Hölder's inequality recursively:

A, 



A«(A) =lnEc 



n 

j=0,...,J,j^i 



1 



i=o,...,jjyi 



q{Y;9i 
q{Y;9, 

q{Y;9i) 



and choosing the $\lambda_j$'s adequately, we get (ii). □

Proof of Proposition 2. The first two results are straightforward applications of Cramér's Theorem in $\mathbb{R}^J$ (see, e.g., [21], Corollary 6.1.6, page 253). Indeed, it is known that the lower bound holds without any supplementary assumption, while the upper bound requires a Cramér condition $0\in\operatorname{int}(\mathcal{D}_{\Lambda^{(i)}})$; indeed, from Lemma 1, this is equivalent to Assumption A2. Then, a full LDP holds:

$$\liminf_{n\to\infty}\frac{1}{n}\ln\mathbb{P}_0(\hat{\theta}^n=\theta_i) \ge -\inf_{y\in\operatorname{int}\mathbb{R}_+^J}\,\sup_{\lambda\in\mathbb{R}^J}\{\langle y,\lambda\rangle-\Lambda^{(i)}(\lambda)\},$$

$$\limsup_{n\to\infty}\frac{1}{n}\ln\mathbb{P}_0(\hat{\theta}^n=\theta_i) \le -\inf_{y\in\mathbb{R}_+^J}\,\sup_{\lambda\in\mathbb{R}^J}\{\langle y,\lambda\rangle-\Lambda^{(i)}(\lambda)\}.$$

In order to prove the final result, we have to show that $\mathbb{R}_+^J$ is a $\Lambda^{(i),*}$-continuity set, that is, $\inf_{y\in\operatorname{int}\mathbb{R}_+^J}\Lambda^{(i),*}(y)=\inf_{y\in\mathbb{R}_+^J}\Lambda^{(i),*}(y)$. It is enough to apply part (ii) in Lemma on page 903 of [66]. □

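Proof of Proposition 3. First of all, we note that $\mathbb{P}_0(\hat{\theta}^n\ne\theta_0)=\mathbb{P}_0\big(\sum_{k=1}^{n}X_k\in\operatorname{int}(\mathbb{R}_+^J)^c\big)$. Therefore, we can apply large deviations principles, with the candidate rate function $\Lambda^*(y)$; this is a strictly convex function on $\operatorname{int}\mathcal{D}_{\Lambda^*}$ globally minimized at

$$y' = [\mathbb{E}_0(\ln q(Y;\theta_0)-\ln q(Y;\theta_j))]_{j=1,\ldots,J}.$$

By Assumption A1, $y'$ is finite and belongs to $\operatorname{int}\mathbb{R}_+^J$. From the strict convexity of the level sets of $\Lambda^*(y)$, the set $\arg\inf_{y\in\operatorname{int}(\mathbb{R}_+^J)^c}\Lambda^*(y)$ has at most finite cardinality $H$. Moreover, since large deviations theory allows us to ignore the part of $\operatorname{int}(\mathbb{R}_+^J)^c$ where $\Lambda^*(y)>\varepsilon+\inf_{y\in\operatorname{int}(\mathbb{R}_+^J)^c}\Lambda^*(y)$, we can replace $(\mathbb{R}_+^J)^c$ with a collection of $H$ disjoint sets, say $\Gamma_h$, $h=1,\ldots,H$, each of them containing in its interior one and only one of the points of $\arg\inf_{y\in\operatorname{int}(\mathbb{R}_+^J)^c}\Lambda^*(y)$ (see [40], page 508):

$$\mathbb{P}_0\Big(\sum_{k=1}^{n}X_k\in\operatorname{int}(\mathbb{R}_+^J)^c\Big) = (1+o(1))\cdot\mathbb{P}_0\Big(\sum_{k=1}^{n}X_k\in\operatorname{int}\bigcup_{h=1}^{H}\Gamma_h\Big) = (1+o(1))\cdot\sum_{h=1}^{H}\mathbb{P}_0\Big(\sum_{k=1}^{n}X_k\in\operatorname{int}\Gamma_h\Big). \tag{12}$$

As before, the bounds derive from Cramér's Theorem in $\mathbb{R}^J$. Noting that the contribution of any $\Gamma_h$ is the same and recalling (12), we get the results. □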



Proof of Proposition 4. The assumptions of the theorem on page 904 of [66] are easily verified. This shows that a unique dominating point $y^{(i)}$ exists and implies, through Proposition on page 161 of [65] (according to the "Remarks on the hypotheses" in [66], page 905, the "lattice" conditions are not necessary), that the stated bracketing of $\mathbb{P}_0(\hat{\theta}^n=\theta_i)$ holds. □

Proof of Proposition 5. Under Assumptions A1, A2, A3 and A4, according to Proposition 2(iii) we have $\mathbb{P}_0\{Q_n(\theta_1)>Q_n(\theta_0)\}=\mathbb{P}_0\{Q_n(\theta_1)\ge Q_n(\theta_0)\}\cdot(1+o(1))$ and we can study the behavior of

$$\mathbb{P}_0(\hat{\theta}^n\ne\theta_0) = \mathbb{P}_0(\hat{\theta}^n=\theta_1) = \mathbb{P}_0\{Q_n(\theta_1)\ge Q_n(\theta_0)\} = \mathbb{P}_0\{Q_n(\theta_1)-Q_n(\theta_0)\in[0,+\infty)\}.$$

Assumption A8 implies that the conditions of Theorem 3.7.4 in [21] (page 110) are verified, in particular the existence of a positive $\mu\in\operatorname{int}(\mathcal{D}_{\Lambda^{(1)}})$ solution to the equation $0=(\Lambda^{(1)})'(\mu)$. From Lemma 2.2.5(c) in [21], this implies $\Lambda^{(1)}(\mu)=-\Lambda^{(1),*}(0)$, and the result follows. □

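Proof of Theorem 1. We note that the function $\kappa(\cdot)$ in [42] (page 1117) is given by

$$\kappa(u) = \ln\mathbb{E}_0\exp[u\cdot(X^{(i)}-\mathbb{E}_0X^{(i)})] = \ln\mathbb{E}_0\exp[u\cdot X^{(i)}]-u\cdot\mathbb{E}_0X^{(i)} = \Lambda^{(i)}(u)-u\cdot\mathbb{E}_0X^{(i)}.$$

Therefore, we write the mean $m(u)$ and covariance matrix $V(u)$ as

$$m(u) = \kappa'(u) = \frac{\partial\kappa(u)}{\partial u} = \frac{\partial\Lambda^{(i)}(u)}{\partial u}-\mathbb{E}_0X^{(i)},$$

$$V(u) = \kappa''(u) = \frac{\partial^2\kappa(u)}{\partial u\,\partial u^{\mathsf{T}}} = \frac{\partial^2\Lambda^{(i)}(u)}{\partial u\,\partial u^{\mathsf{T}}}.$$

From (2), we have

$$\mathbb{P}_0(\hat{\theta}^n=\theta_i) = \mathbb{P}_0\Big(\sum_{k=1}^{n}X_k^{(i)}\in\operatorname{int}\mathbb{R}_+^J\Big) = \mathbb{P}_0\Big(\frac{1}{n}\cdot\sum_{k=1}^{n}(X_k^{(i)}-\mathbb{E}_0X^{(i)})\in\operatorname{int}(\mathbb{R}_+^J)\ominus\mathbb{E}_0X^{(i)}\Big).$$

Now we verify Assumptions (S.1)-(S.4) of [42]. Assumption (S.1) is implied by A2. Assumptions (S.2) and (S.3) hold since the random vectors are i.i.d. and nontrivial. At last, (S.4) is implied by A5 (see, e.g., [72], page 735). Since $\mathbb{E}_0X^{(i)}$ is strictly negative by A1, $\operatorname{int}\mathbb{R}_+^J\ominus\mathbb{E}_0X^{(i)}$ does not contain 0 and, according to Theorem 1 in [42] (page 1118), the result of the theorem follows. □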

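Proof of Proposition 6. First of all, we prove (5). We suppose that

$$\int\ln\left\{\frac{f_Y(y;\theta_i)}{f_Y(y;\theta_0)}\right\}f_Y(y;\theta_i)\,\mu(dy) < \infty;$$

otherwise the inequality is trivial. Then, for any $\theta_i\in\Theta\setminus\{\theta_0\}$, we apply Lemma 3.4.7 in [21] (page 94) with $\alpha_n=\mathbb{P}_{\theta_i}\{\tilde{\theta}^n\ne\theta_i\}$ and $\beta_n=\mathbb{P}_{\theta_0}\{\tilde{\theta}^n\ne\theta_0\}$; since $\tilde{\theta}^n$ is strongly consistent, $\alpha_n$ is ultimately less than any $\varepsilon>0$ and the bound holds.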

The second part can be proved as follows. Define 
the sets 



^nU) = {u}:t 



BnU) 



uj: — In 

n 

<Efl.ln 



fviX-eo) 



+ e 



Therefore, we have 

= Ee„i{r/0o} 



:E. 



>E. 



fviy 



fviY 
fviY 



fviY 



%) 



l{e~VM 

l{^n(j)} 



>E,l{A„(j)}l{i?„(i)} 

fY{Y-e, 



-n ■ 



■ exp 

■ exp 



-n ■ 



fY{Y;0j 



■ exp< — n • 



+ e 



+ e 



+ e 



This implies: 

liminf-lnPgjr /^ol 
n^oo n 



> -Ee ln 



fY{Y;e,) 
fviY- 6^) 



lim inf — ln[l 

n— >oo n 



Now, since lim,i_^oo {-B^(i)} = and 
limsup„_^o^F0j.{^" j^9j} <l, the third term in the 
right-hand side goes to zero; since e is arbitrary, the 
result follows. □ 



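Proof of Proposition 7. From the Neyman-Pearson Lemma, we have

$$\sup_{\theta_0\in\Theta}\mathbb{P}_{\theta_0}(\tilde{\theta}^n\ne\theta_0) \ge \max\{\mathbb{P}_{\theta_0}(\tilde{\theta}^n\ne\theta_0),\,\mathbb{P}_{\theta_1}(\tilde{\theta}^n\ne\theta_1)\} \ge \frac{1}{2}\{\mathbb{P}_{\theta_0}(\tilde{\theta}^n\ne\theta_0)+\mathbb{P}_{\theta_1}(\tilde{\theta}^n\ne\theta_1)\} \ge \frac{1}{2}\left\{\mathbb{P}_{\theta_0}\!\left(\frac{L_n(\theta_0)}{L_n(\theta_1)}<1\right)+\mathbb{P}_{\theta_1}\!\left(\frac{L_n(\theta_0)}{L_n(\theta_1)}\ge 1\right)\right\}$$

for an arbitrary couple of different alternatives $\theta_0$ and $\theta_1$ in $\Theta$. Then we can use Chernoff's Bound ([21], page 93); the final expression derives from the equality $\Lambda^*(0)=-\inf_{\lambda\in\mathbb{R}}\Lambda(\lambda)$. □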

Proof of Proposition 9. In order to prove that the MLE is admissible and minimax we use the Bayesian method. Using the prior densities given by $\pi(\theta_k)=(J+1)^{-1}$, the Bayes estimator relative to zero-one loss coincides with the MLE $\hat{\theta}^n$. Therefore, respectively from Lemma 2.10 and Proposition 6.3 in [71], $\hat{\theta}^n$ is minimax and admissible. The fact that the MLE minimizes the average probability of error derives from Proposition 8. □

Proof of Proposition 10. (i) In order to prove the first statement, we apply Lemma 2.4 in [45] (page 653). Clearly $\mathcal{P}$ is closed in total variation, since it is finite, and is not exponentially convex; indeed, under Assumption A7, there exist $\theta_1,\theta_2\in\Theta$ and $\alpha\in[0,1]$, such that the probability measure $P_{\theta(\alpha)}$ defined as

$$P_{\theta(\alpha)}(dx) = \frac{(f_{\theta_1}(x))^{\alpha}\cdot(f_{\theta_2}(x))^{1-\alpha}}{\int(f_{\theta_1}(x))^{\alpha}\cdot(f_{\theta_2}(x))^{1-\alpha}\,\mu(dx)}\cdot\mu(dx)$$

does not belong to $\mathcal{P}$. Therefore, from Lemma 2.4(iii) in [45], there exist $\theta_1',\theta_2'\in\Theta$ such that Equation (2.12) in [45] holds and, as a consequence of Lemma 2.4(i) in [45], the MLE fails to be an inaccuracy rate optimal estimator at least at one of the points $\theta_1',\theta_2'$. This means that, say for $\theta_1'$:

$$\liminf_{n\to\infty}\frac{1}{n}\ln\mathbb{P}_{\theta_1'}\{|\hat{\theta}^n-\theta_1'|>\varepsilon\} > \sup_{\theta\in\Theta,\,|\theta-\theta_1'|>\varepsilon}\mathbb{E}_{\theta}\ln\left(\frac{f_Y(Y;\theta_1')}{f_Y(Y;\theta)}\right)$$

and this implies that the Chapman-Robbins bound is not attained at $\theta_1'$.

(ii) The second statement follows easily from the results of [43] (Theorem 2) on $\lim_{n\to\infty}\frac{1}{n}\ln r_1(\hat{\theta}^n,\pi)$, using Equation (8). Indeed, the MLE attains the lower bound (7) and is therefore asymptotically minimax efficient.

(iii) If the estimator is asymptotically CR-efficient w.r.t. $\mathcal{R}_1$ at $\theta_0$, this means that at $\theta_0$ it is more efficient than the MLE and therefore it has to be less efficient elsewhere (since from Proposition 9 the MLE minimizes the probability of error). Therefore, it cannot be minimax CR-efficient. □

Proof of Proposition 11. For (i) it is enough 
to follow the proof of Proposition 6 and to reason 
by contradiction, while (ii) is simply another way of 
stating Proposition 9. □ 